The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice and the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of the challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
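The patch-based strategy mentioned above can be illustrated with a minimal sketch. It assumes a 2D NumPy array, non-overlapping square tiles, and zero padding at the borders; the helper name `extract_patches` and the patch size are hypothetical choices for illustration and are not taken from the survey.

```python
import numpy as np


def extract_patches(image: np.ndarray, patch: int) -> np.ndarray:
    """Split a 2D image into non-overlapping square patches (hypothetical helper).

    The image is zero-padded at the bottom and right so that both sides
    become multiples of `patch`.
    """
    h, w = image.shape
    pad_h = (-h) % patch
    pad_w = (-w) % patch
    padded = np.pad(image, ((0, pad_h), (0, pad_w)), mode="constant")
    rows = padded.shape[0] // patch
    cols = padded.shape[1] // patch
    # View as (rows, patch, cols, patch), then move the two grid axes to the front.
    tiles = padded.reshape(rows, patch, cols, patch).swapaxes(1, 2)
    return tiles.reshape(rows * cols, patch, patch)


# Illustrative input: a 5x7 image that is too "large" for a 4x4 budget.
image = np.arange(5 * 7, dtype=np.float32).reshape(5, 7)
patches = extract_patches(image, patch=4)
print(patches.shape)
```

Overlapping tiles or border mirroring are common alternatives; the non-overlapping, zero-padded variant is chosen here only because it is the shortest to write down.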
With the rise of AI in recent years and the increasing complexity of the models, the growing demand for computational resources is starting to pose a significant challenge. The need for higher compute power is being met with increasingly potent accelerators and the use of large compute clusters. However, the gain in prediction accuracy from large models trained on distributed and accelerated systems comes at the price of a substantial increase in energy demand, and researchers have started questioning the environmental friendliness of such AI methods at scale. Consequently, energy efficiency plays an important role for AI model developers and infrastructure operators alike. The energy consumption of AI workloads depends on the model implementation and the utilized hardware. Therefore, accurate measurements of the power draw of AI workflows on different types of compute nodes are key to algorithmic improvements and the design of future compute clusters and hardware. To this end, we present measurements of the energy consumption of two typical applications of deep learning models on different types of compute nodes. Our results indicate that (1) deriving energy consumption directly from runtime is not accurate; instead, the composition of the compute node must be taken into account; (2) neglecting accelerator hardware on mixed nodes results in disproportionate inefficiency regarding energy consumption; and (3) energy consumption of model training and inference should be considered separately: while training on GPUs outperforms all other node types regarding both runtime and energy consumption, inference on CPU nodes can be comparably efficient. One advantage of our approach is that the information on energy consumption is available to all users of the supercomputer, enabling an easy transfer to other workloads alongside a rise in user awareness of energy consumption.
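To make the first finding concrete, the sketch below integrates sampled power draw over time with the trapezoidal rule. The function name `energy_joules` and all sample values are hypothetical and purely illustrative; they are not measurements from the paper. The point is only that two runs with identical runtime can consume very different amounts of energy.

```python
import numpy as np


def energy_joules(timestamps_s, power_w):
    """Integrate sampled power draw (watts) over time (seconds).

    Uses the trapezoidal rule; returns energy in joules.
    """
    t = np.asarray(timestamps_s, dtype=float)
    p = np.asarray(power_w, dtype=float)
    return float(np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(t)))


# Hypothetical power traces for two node types with the same 100 s runtime.
timestamps = [0.0, 50.0, 100.0]
node_a_power = [400.0, 400.0, 400.0]  # assumed values, not measured
node_b_power = [150.0, 150.0, 150.0]  # assumed values, not measured

energy_a = energy_joules(timestamps, node_a_power)
energy_b = energy_joules(timestamps, node_b_power)
print(f"node A: {energy_a / 3600:.2f} Wh, node B: {energy_b / 3600:.2f} Wh")
```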
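Performance metrics for medical image segmentation models are used to measure the agreement between reference annotations and predictions. A common set of metrics is used in the development of such models to make results more comparable. However, there is a mismatch between the distribution in public datasets and the cases encountered in clinical practice. Many common metrics fail to measure the impact of this mismatch, especially for clinical datasets that contain uncertain, small, or empty reference annotations. Consequently, such metrics may not validate whether a model achieves clinically meaningful agreement. Dimensions for evaluating clinical value include independence from the volume of the reference annotation, consideration of the uncertainty of reference annotations, reward for volumetric and/or location agreement, and reward for correct classification of empty reference annotations. Unlike common public datasets, our in-house dataset is more representative: it contains uncertain, small, or empty reference annotations. We examine publicly available metrics on the predictions of a deep learning framework to identify the settings in which common metrics provide meaningful results. We compare against a public benchmark dataset without uncertain, small, or empty reference annotations. The code will be published.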
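The difficulty with small or empty reference annotations can be seen in the Dice coefficient, a widely used overlap metric. The sketch below is a minimal illustration, not the authors' code; the function name `dice` and the convention of returning 1.0 when both masks are empty are assumptions made here.

```python
import numpy as np


def dice(reference, prediction, empty_value=1.0):
    """Dice coefficient between two binary masks.

    `empty_value` is returned when both masks are empty; choosing 1.0 is a
    convention assumed for this sketch, since the metric is undefined there.
    """
    ref = np.asarray(reference, dtype=bool)
    pred = np.asarray(prediction, dtype=bool)
    denom = int(ref.sum()) + int(pred.sum())
    if denom == 0:
        return float(empty_value)
    overlap = int(np.logical_and(ref, pred).sum())
    return 2.0 * overlap / denom


empty = np.zeros((4, 4), dtype=bool)

# One false-positive voxel against an empty reference.
one_fp = empty.copy()
one_fp[0, 0] = True

# A one-voxel reference and a prediction shifted by a single voxel.
small_ref = empty.copy()
small_ref[1, 1] = True
shifted = empty.copy()
shifted[1, 2] = True

print(dice(empty, empty), dice(empty, one_fp), dice(small_ref, shifted))
```

A single stray voxel against an empty reference, or a one-voxel shift of a tiny structure, drives the score to zero regardless of how clinically minor the error is.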
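Semantic segmentation is one of the most popular research areas in medical image computing. Perhaps surprisingly, despite dating back to 2018, nnU-Net continues to provide competitive out-of-the-box solutions for a broad variety of segmentation problems and is regularly used as a development framework for challenge-winning algorithms. Here, we use nnU-Net to participate in the AMOS2022 challenge, which comes with a unique set of tasks: not only is the dataset one of the largest ever created, with 15 target structures, but the competition also requires submitted solutions to handle both MRI and CT scans. Through careful modification of nnU-Net's hyperparameters, the addition of residual connections in the encoder, and the design of a custom postprocessing strategy, we were able to substantially improve upon the nnU-Net baseline. Our final ensemble achieves Dice scores of 90.13 for Task 1 (CT) and 89.06 for Task 2 (CT+MRI) in a 5-fold cross-validation on the provided training cases.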
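The field of automatic biomedical image analysis crucially depends on robust and meaningful performance metrics for algorithm validation. Current metric usage, however, is often ill-informed and does not reflect the underlying domain interest. Here, we present a comprehensive framework that guides researchers towards choosing performance metrics in a problem-aware manner. Specifically, we focus on biomedical image analysis problems that can be interpreted as a classification task at image, object, or pixel level. The framework first compiles domain interest-, target structure-, dataset-, and algorithm output-related properties of a given problem into a problem fingerprint, while also mapping it to the appropriate problem category, namely image-level classification, semantic segmentation, instance segmentation, or object detection. It then guides users through the process of selecting and applying a set of appropriate validation metrics while making them aware of potential pitfalls related to individual choices. In this paper, we describe the current status of the Metrics Reloaded recommendation framework, with the goal of obtaining constructive feedback from the image analysis community. The current version has been developed within an international consortium of more than 60 image analysis experts and will be made openly available as a user-friendly toolkit after community-driven optimization.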
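Single particle imaging (SPI) at X-ray free-electron lasers (XFELs) is particularly well suited to determining the 3D structure of particles in their native environment. For a successful reconstruction, diffraction patterns originating from a single hit must be isolated from a large number of acquired patterns. We propose to formulate this task as an image classification problem and to solve it using convolutional neural network (CNN) architectures. Two CNN configurations are developed: one that maximizes the F1 score and one that emphasizes high recall. We also combine the CNNs with expectation maximization (EM) selection as well as size filtering. We observed that our CNN selections have lower contrast in power spectral density functions than the EM selection used in our previous work. However, the reconstructions based on our CNN selections give similar results. Introducing CNNs into SPI experiments allows the reconstruction pipeline to be streamlined, enables researchers to classify patterns on the fly, and, as a consequence, lets them tightly control the duration of their experiments. We think that bringing non-standard artificial intelligence (AI) based solutions into a well-described SPI analysis workflow may be beneficial for the future development of SPI experiments.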
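Simultaneous localization and categorization of objects in medical images, also referred to as medical object detection, is of high clinical relevance because diagnostic decisions often depend on the rating of objects rather than, e.g., pixels. For this task, the cumbersome and iterative process of method configuration constitutes a major research bottleneck. Recently, nnU-Net has tackled this challenge for the task of image segmentation with great success. Following nnU-Net's agenda, in this work we systematize and automate the configuration process for medical object detection. The resulting self-configuring method, nnDetection, adapts itself without any manual intervention to arbitrary medical detection problems while achieving results on par with or superior to the state of the art. We demonstrate the effectiveness of nnDetection on two public benchmarks, ADAM and LUNA16, and propose further medical object detection tasks on public datasets for comprehensive method evaluation. Code is available at https://github.com/mic-dkfz/nndetection.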
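While the importance of automatic image analysis is continuously increasing, recent meta-research has revealed major flaws with respect to algorithm validation. Performance metrics are particularly key for meaningful, objective, and transparent performance assessment and validation of the automatic algorithms used, but relatively little attention has been given to the practical pitfalls of using specific metrics for a given image analysis task. These are typically related to (1) the disregard of inherent metric properties, such as the behavior in the presence of class imbalance or small target structures, (2) the disregard of inherent dataset properties, such as the non-independence of test cases, and (3) the disregard of the actual biomedical domain interest that the metrics should reflect. The purpose of this living document is to illustrate important limitations of performance metrics commonly applied in the field of image analysis. In this context, it focuses on biomedical image analysis problems that can be phrased as image-level classification, semantic segmentation, instance segmentation, or object detection tasks. The current version is based on a Delphi process on metrics conducted by an international consortium of image analysis experts from more than 60 institutions worldwide.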
Many problems in machine learning involve bilevel optimization (BLO), including hyperparameter optimization, meta-learning, and dataset distillation. Bilevel problems consist of two nested sub-problems, called the outer and inner problems, respectively. In practice, often at least one of these sub-problems is overparameterized. In this case, there are many ways to choose among optima that achieve equivalent objective values. Inspired by recent studies of the implicit bias induced by optimization algorithms in single-level optimization, we investigate the implicit bias of gradient-based algorithms for bilevel optimization. We delineate two standard BLO methods -- cold-start and warm-start -- and show that the converged solution or long-run behavior depends to a large degree on these and other algorithmic choices, such as the hypergradient approximation. We also show that the inner solutions obtained by warm-start BLO can encode a surprising amount of information about the outer objective, even when the outer parameters are low-dimensional. We believe that implicit bias deserves as central a role in the study of bilevel optimization as it has attained in the study of single-level neural net optimization.
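For reference, the nested structure described above is commonly written as follows. The symbols are notation assumed here rather than taken from the abstract: $\lambda$ denotes the outer parameters, $w$ the inner parameters, $F$ the outer objective, and $f$ the inner objective.

```latex
\begin{aligned}
\lambda^\star &\in \operatorname*{arg\,min}_{\lambda} \; F\bigl(\lambda, w^\star(\lambda)\bigr), \\
w^\star(\lambda) &\in \operatorname*{arg\,min}_{w} \; f(\lambda, w).
\end{aligned}
```

When the inner problem is overparameterized, the inner $\arg\min$ is a set rather than a single point, so $w^\star(\lambda)$ is not uniquely determined; this is where the choice among equivalent optima, and hence the implicit bias of the algorithm, enters.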
By optimizing the rate-distortion-realism trade-off, generative compression approaches produce detailed, realistic images, even at low bit rates, instead of the blurry reconstructions produced by rate-distortion optimized models. However, previous methods do not explicitly control how much detail is synthesized, which results in a common criticism of these methods: users might be worried that a misleading reconstruction far from the input image is generated. In this work, we alleviate these concerns by training a decoder that can bridge the two regimes and navigate the distortion-realism trade-off. From a single compressed representation, the receiver can choose to produce a low mean squared error reconstruction that is close to the input, a realistic reconstruction with high perceptual quality, or anything in between. With our method, we set a new state of the art in distortion-realism, pushing the frontier of achievable distortion-realism pairs, i.e., our method achieves better distortions at high realism and better realism at low distortion than ever before.